ABSTRACT


This paper is an investigation of a goodness of fit
test for bivariate normal distributions. The test
procedure is based on random linear functions of bivariate
normal random variables. The test makes use of the maximum,
M, of the Kolmogorov D statistics computed over the linear
functions. An estimate of the distribution of M is
obtained by computer simulation. No attempt is made to
determine the power of the test.


NAVAL POSTGRADUATE SCHOOL
MONTEREY, CALIF. 93940
TABLE OF CONTENTS
I. INTRODUCTION
II. THEORY
III. EMPIRICAL RESULTS
IV. RANDOM VARIABLE GENERATION TECHNIQUES
V. SUMMARY AND CONCLUSIONS
COMPUTER PROGRAM
LIST OF REFERENCES
INITIAL DISTRIBUTION LIST
FORM DD 1473





LIST OF TABLES 


I. Distribution (Relative Frequency) of
Kolmogorov-Smirnov Statistics that Result
from 100 Linear Combinations of 100
Bivariate Random Vectors from Two
Bivariate Normal Distributions

II. Distributions (Relative Frequency) of 500
Maximum Kolmogorov-Smirnov Statistics,
Each of which was Derived from 25
Linear Combinations of Samples from
Various Bivariate Normal Distributions

III. Distribution (Relative Frequency) of 500
Maximum Kolmogorov-Smirnov Statistics
for Samples Drawn from Various Bivariate
Normal Distributions with Identical
Correlation Coefficient, $\rho$





I. INTRODUCTION


One application of statistics is the attempt to
find a specific probability distribution which fits an
observed sample of data. A well fitting distribution can
then be used to predict values of future occurrences,
relative frequencies of future occurrences, etc. There
are numerous methods available for testing the goodness
of fit of data in scalar form to a hypothesized univariate
probability distribution. Among these are the Chi-square
and the Kolmogorov-Smirnov tests, of which the
Kolmogorov-Smirnov test is considered the more powerful [1].
Furthermore, the Kolmogorov-Smirnov test exhibits the very
attractive characteristic that it is based on a statistic
whose distribution does not depend on the distribution of
the random variable being sampled.

However, there appears to be a lack of methods for
testing the fit of hypothesized multivariate distributions
to multivariate data (observations in vector form).
Furthermore, no statistic which has the desirable
characteristic of being distribution free and which can be
used in multivariate goodness of fit tests has been found.
In fact, no such statistic may even exist. For instance,
Simpson [2] has shown an example of continuous bivariate
distributions for which the analog of the Kolmogorov-Smirnov
statistic is dependent on the underlying distribution.


Rosenblatt [3] discusses a possible test which involves
a transformation of an absolutely continuous k-variate
distribution into the uniform distribution on a
k-dimensional hypercube. The transformation is uniquely
determined by the theoretical distribution against which
the sample is to be tested. The transformed sample
may then be tested against the uniform distribution in
k dimensions. There are several disadvantages to this
procedure, however. For example, the results are influenced
by the manner in which the components of the observed
vectors are ordered.

The purpose of this paper is to describe a goodness of
fit test for testing a bivariate normal distribution (with
given mean and covariance matrix) against samples of
bicomponent data. The test results in acceptance or rejection
of the hypothesis that $F_X = F$, where $F_X$ is the cumulative
distribution function of the population of the bivariate
sample vectors and $F$ is the hypothesized cumulative
distribution function. The notation in this paper closely
follows the notation used by Anderson [4]. It is expected that the
test developed here for the bivariate normal distribution
can be extended to the case of the k-variate normal
distribution. This paper is restricted to a consideration of
testing the fit of samples to a distribution which has
zero mean. This restriction causes no loss of generality,
since any distribution with a finite mean can be translated
to mean zero by subtracting the mean from each observation.


The goodness of fit test was developed according to the
following procedures:

1) a characterization of the bivariate normal
distribution is used to develop a test statistic M for use
in a goodness of fit test;

2) the distributional properties of M are investigated
by computer simulation.


II. THEORY


Since there appeared to be no widely known statistic
for a reasonable goodness of fit test for multivariate
distributions in general, and the multivariate normal
distribution in particular, it seemed plausible that a
statistic suitable for a goodness of fit test might be
found by considering characterizations of the multivariate
normal distribution.

One property which characterizes a multivariate normal 
distribution is given in the following theorem [4]: 

Theorem 1. A p-dimensional random variable X has a
p-variate normal distribution if and only if every
linear function of X has a univariate normal distribution.

The parameters of the univariate normal distribution can be
computed according to Theorem 2.

Theorem 2. Let X (a column vector with p components)
be distributed according to $N(\mu, \Sigma)$, a multivariate
normal distribution with mean (vector) $\mu$ and covariance
(matrix) $\Sigma$, and let C be a row vector of p constants. Then

$$Y = CX$$

is distributed as univariate normal with mean $C\mu$ and
variance $C \Sigma C'$ ($C'$ is the transpose of C).

(NOTE: CX can be described as a linear combination of the
components of X.)
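Theorem 2 reduces the bivariate hypothesis to a univariate one whose parameters are simple sums. The following minimal Python sketch (the function name is illustrative, not taken from the original program) evaluates $C\mu$ and $C\Sigma C'$ term by term for a small example.

```python
def linear_combination_params(c, mu, sigma):
    """Return (C mu, C Sigma C') for Y = CX, following Theorem 2.

    c: row vector of p constants; mu: mean vector of length p;
    sigma: p x p covariance matrix given as a list of rows.
    """
    p = len(c)
    mean = sum(c[i] * mu[i] for i in range(p))
    variance = sum(c[i] * sigma[i][j] * c[j]
                   for i in range(p) for j in range(p))
    return mean, variance


# C = (1, 2), mu = (0, 0), Sigma = [[1, .5], [.5, 2]]:
# variance = 1*1 + 2*(1*2*.5) + 2*2*2 = 11
print(linear_combination_params([1, 2], [0, 0], [[1, 0.5], [0.5, 2]]))  # prints (0, 11.0)
```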

From the characterization of the multivariate normal 
distribution given in theorem 1, it was felt that a suitable 
goodness of fit test procedure might be to test the result 
of a linear combination of the sample vectors (whose distri- 
bution had been hypothesized as a specific multivariate 
normal distribution) against the hypothesized theoretical 
univariate normal distribution which has been computed for 
the particular linear combination. Thus the problem is 
reduced to the univariate level and use can be made of well 
known univariate statistics which provide acceptable
goodness of fit tests.

However, Theorem 1 states that every linear combination
of multivariate normal random variables must be univariate
normal. Obviously, then, one linear combination will not suffice
for a reasonable test. It is not difficult to envision
that there exists some linear function of nearly any vector
sample which will transform that vector sample into one
which is accepted as univariate normal. In fact, if the
marginal distributions of the components are univariate
normal, but the joint distribution is not multivariate
normal, the linear combination consisting of one component
($CX = 1X_1 + 0X_2 + \dots + 0X_p = X_1$) is univariate normal. Thus
a test which uses only one linear combination might be
manipulated by the tester to give any results he desires.
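The sign-flip construction below is a standard textbook example, not one taken from this paper; it is included only to illustrate the claim that normal marginals do not imply a bivariate normal joint distribution. The combination C = (1, 0) reproduces the normal component exactly, while C = (1, 1) exposes the departure from normality. The function name is hypothetical.

```python
import random


def sign_flip_pairs(n, seed=0):
    """Illustrative example (not from the paper): X1 ~ N(0,1) and X2 = S*X1,
    where S = +1 or -1 with probability 1/2 each, independent of X1.
    Both marginals are N(0,1), but (X1, X2) is not bivariate normal."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        x1 = rng.gauss(0.0, 1.0)
        s = rng.choice((-1.0, 1.0))
        pairs.append((x1, s * x1))
    return pairs


pairs = sign_flip_pairs(10_000)

# C = (1, 0): the combination is X1 itself, which is exactly N(0,1).
first_component = [1.0 * x1 + 0.0 * x2 for x1, x2 in pairs]

# C = (1, 1): the combination is (1 + S) X1, which equals 0 whenever S = -1,
# so about half of its values are exactly zero; it cannot be normal.
component_sum = [x1 + x2 for x1, x2 in pairs]
zero_fraction = sum(1 for y in component_sum if y == 0.0) / len(component_sum)
print(f"fraction of exact zeros in X1 + X2: {zero_fraction:.3f}")
```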




On the other hand, it is clearly impossible to compute 
every linear combination of a sample. As a compromise, it 
was felt that a number (to be determined) of randomly 
selected linear combinations would serve as a representa- 
tive sample upon which an overall test statistic might be 
based. To produce random linear combinations, the (column) 
sample vectors were multiplied by a (row) vector of random 
constants. The random components of the 'multiplying
vectors' were drawn from the uniform (0,1) distribution.
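A minimal sketch of how the random multiplying vectors and the resulting linear combinations might be produced, assuming Python's `random.Random` as the uniform (0,1) generator (the original program used its own routines, described in Section IV). The helper name `random_linear_combinations` is hypothetical.

```python
import random


def random_linear_combinations(sample, m, rng):
    """Form m random linear combinations Y_i = C_i X of a bivariate sample.

    sample: list of (x1, x2) vectors; each C_i = (c1, c2) has components
    drawn independently from the uniform (0,1) distribution.
    Returns a list of (C_i, [C_i X_1, ..., C_i X_N]) pairs.
    """
    combos = []
    for _ in range(m):
        c = (rng.random(), rng.random())          # uniform (0,1) multipliers
        y = [c[0] * x1 + c[1] * x2 for x1, x2 in sample]
        combos.append((c, y))
    return combos


rng = random.Random(1)
sample = [(1.0, 2.0), (-0.5, 0.3), (0.2, -1.1)]
for c, y in random_linear_combinations(sample, 2, rng):
    print([round(v, 3) for v in c], [round(v, 3) for v in y])
```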

A uniform (0,1) distribution for the random multipliers 
was used because: 

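1) Up to multiplicative constants, essentially any
linear combination of the components of the multivariate
vector could be produced using coefficients from the uniform
(0,1) distribution. (Strictly, nonnegative coefficients yield
only combinations whose coefficients share a common sign; a
combination such as $X_1 - X_2$ requires multipliers of mixed sign.)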

2) A component of a random multiplier was equally 
likely to be contained in any one interval in (0,1) as in 
any other interval, provided the intervals were of equal 
length. Thus there should be no specific interval containing
a 'concentration' of the multipliers which might
adversely influence the performance of the goodness of fit
test. 

NOTE: The results of the goodness of fit test described
in this paper, using random multipliers from a uniform (0,1)
distribution, were the same as results obtained using random
multipliers from a uniform distribution on a different interval.


The Kolmogorov-Smirnov test was employed to determine 
acceptance (or rejection) of the hypothesis that the 
linear combinations of bivariate sample vectors are from 
the (computed) theoretical univariate distributions noted 
in theorem 2. As noted previously, the Kolmogorov-Smirnov 
test is considered more powerful than the Chi-square test. 
Of course, the distribution free characteristic of the
Kolmogorov D statistic applies in particular to linear
combinations of the components of multivariate normal
random variables. A description of the Kolmogorov-Smirnov
test is presented in the following paragraphs.

One method of testing the simple hypothesis $H: F_X = F$,
where $F_X$ is the cumulative distribution of the population
sampled and $F$ is the theoretical continuous distribution
proposed for the population, is the Kolmogorov D statistic.
The asymptotic distribution of D was investigated by
Kolmogorov and tabulated by Smirnov [6] and, for small
sample sizes, by Massey [7].

The Kolmogorov D statistic is derived from the sample
cumulative distribution function, $S_N$, and the proposed
theoretical cumulative distribution function, $F$, as follows.


Let $Y_1, \dots, Y_N$ be a random sample from a continuous population
with cumulative distribution function $F$. Let $Z_1, \dots, Z_N$ be
the order statistics of $Y$, so that

$$Z_1 < Z_2 < \dots < Z_N .$$

The sample cumulative distribution function is

$$S_N(x) = \begin{cases} 0, & x < Z_1, \\ j/N, & Z_j \le x < Z_{j+1}, \quad j = 1, \dots, N-1, \\ 1, & x \ge Z_N. \end{cases}$$

The Kolmogorov D statistic is defined as

$$D = \sup_x \, |S_N(x) - F(x)|$$


and can be described roughly as the maximum deviation of
the sample cumulative distribution function from the
proposed theoretical cumulative distribution function. The
D statistic has the property that its distribution does
not depend upon the underlying (continuous) distribution F.
Clearly, it is dependent on the sample size N, because the sample
cumulative distribution function, $S_N$, takes as values only
multiples of 1/N. Naturally the D statistic approaches
zero, almost surely, as N becomes large without limit,
provided the sample is actually from a population with
distribution F. Critical values, T, of the D statistic are
obtained from the tabulated distributions and are used with
a sample to determine acceptance or rejection (reject if
D > T) of the hypothesis that $F_X = F$.
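Because $S_N$ is a step function that jumps by 1/N at each order statistic and F is continuous and nondecreasing, the supremum in the definition of D is attained at, or immediately to the left of, one of the $Z_j$. That gives the finite formula $D = \max_j \max\{j/N - F(Z_j),\ F(Z_j) - (j-1)/N\}$, sketched below in Python (the function name is illustrative).

```python
from statistics import NormalDist


def kolmogorov_d(values, cdf):
    """Kolmogorov D = sup_x |S_N(x) - F(x)| for a continuous hypothesized F.

    S_N jumps from (j-1)/N to j/N at the j-th order statistic Z_j, so the
    supremum is max over j of max(j/N - F(Z_j), F(Z_j) - (j-1)/N).
    """
    z = sorted(values)
    n = len(z)
    d = 0.0
    for j, zj in enumerate(z, start=1):
        f = cdf(zj)
        d = max(d, j / n - f, f - (j - 1) / n)
    return d


# Hypothesized F: standard normal; a tiny illustrative sample.
print(kolmogorov_d([-1.0, 0.0, 1.0], NormalDist(0.0, 1.0).cdf))  # about 0.1747
```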

One value of the Kolmogorov D statistic is derived from
each linear combination of a bivariate sample of vectors.
For example, let $X = (X_1, \dots, X_N)$, where $X_j = (X_{j1}, X_{j2})'$ and
$X_{jk}$ is a scalar, be a random sample of size N of bivariate
random vectors. A linear combination $Y = CX$, where
$C = (c_1, c_2)$ and $c_k$ is a scalar, is a vector of N scalars,
$(Y_1, \dots, Y_N)$. Let $\Sigma$ be the hypothesized covariance matrix
of the distribution of $X_1, \dots, X_N$. To obtain a rough test of the
hypothesis that the distribution of X is bivariate normal
with covariance matrix $\Sigma$ (and mean zero), one may test the
hypothesis that $Y = CX$ is distributed as univariate normal
with variance $C \Sigma C'$ (and mean zero). A Kolmogorov-Smirnov
test of this hypothesis, when a particular value of C, say $C_1$,
is used to compute a linear combination, will yield one value
of D, that is

$$D = \sup_y |F_Y(y) - S_Y(y)|$$


where $F_Y$ is the hypothesized univariate normal cumulative
distribution of Y and $S_Y$ is the sample cumulative
distribution of the transformed sample. When the procedure described
above is repeated for a different value of C, say $C_2$, then
another value of D, which may or may not be identical to
the first value, is obtained. If every value of D obtained
with various values of C is less than the critical value of
D for the given sample size and level, then the hypothesis
is accepted. Likewise, if every value of D obtained from
using various values of C is greater than the critical
value, then the hypothesis is rejected. However, some
linear combinations of typical samples from a bivariate
normal population can be expected to give values of D
exceeding the critical value while other linear combinations
may give values of D less than the critical value. To
eliminate the ambiguity of such results, another statistic
must be used, preferably one whose distribution function
can be readily tabulated or computed. For this purpose we
use the maximum of the D statistics, M, derived from
Kolmogorov-Smirnov tests of a large number, m, of linear
combinations of the sample vectors, that is,

$$M = \max_{i} \, \sup_y \, |S_{C_iX}(y) - F_i(y)|, \qquad i = 1, 2, \dots, m,$$

where $S_{C_iX}$ is the sample cumulative distribution function
of the linear combination $C_iX$ and $F_i(y)$ is the hypothesized
theoretical cumulative distribution of the linear combination $C_iX$.
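The definition of M can be sketched as follows, assuming mean zero, the uniform (0,1) multipliers described above, and the exact order-statistic form of D rather than the 100-point grid approximation used in the simulations of Section III. Names such as `m_statistic` and `ks_d` are hypothetical.

```python
import random
from statistics import NormalDist


def ks_d(values, cdf):
    """Kolmogorov D = sup |S_N - F|, evaluated at the order statistics."""
    z = sorted(values)
    n = len(z)
    return max(max(j / n - cdf(v), cdf(v) - (j - 1) / n)
               for j, v in enumerate(z, start=1))


def m_statistic(sample, sigma, m, rng):
    """M = max over m random combinations C_i X of the Kolmogorov D,
    each C_i X tested against N(0, C_i Sigma C_i') as in Theorem 2."""
    (s11, s12), (s21, s22) = sigma
    best = 0.0
    for _ in range(m):
        c1, c2 = rng.random(), rng.random()           # uniform (0,1) multipliers
        variance = c1 * c1 * s11 + c1 * c2 * (s12 + s21) + c2 * c2 * s22
        y = [c1 * x1 + c2 * x2 for x1, x2 in sample]  # the combination C_i X
        f_i = NormalDist(0.0, variance ** 0.5).cdf    # hypothesized F_i
        best = max(best, ks_d(y, f_i))
    return best


rng = random.Random(7)
sample = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(25)]
identity = ((1.0, 0.0), (0.0, 1.0))
print(f"M over 25 combinations: {m_statistic(sample, identity, 25, rng):.4f}")
```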

The values of the D statistics derived from linear
combinations of a particular sample appear to be more highly
dependent on the sample than on the random multiplying
vectors (see Section III, Empirical Results). Therefore
the maximum D might be expected to be a result of only the
sample, so that rejection or acceptance of the hypothesis
$H: F_X = F$, where $F_X$ is the cumulative distribution function
of the population sampled and $F$ is the proposed theoretical
cumulative distribution function, would depend only on the
bivariate sample.

The distribution of M, as defined above, appeared to
be intractable to obtain in closed mathematical form. However,
the properties of the distribution of M were investigated
for several cases by examining empirical data obtained from
computer simulation. The data were produced by generating
samples from a specified bivariate normal distribution,
computing random linear combinations of the sample vectors,
and recording the maximum of the resulting Kolmogorov D
statistics. This procedure was repeated to give several
lists of 500 values of M. Each list of 500 values of M
was derived from bivariate sample vectors with a different
underlying bivariate normal distribution. The empirical
results and generating techniques are discussed in Sections
III and IV.


III. EMPIRICAL RESULTS


In order to obtain the empirical data to study the
distribution of M, the maximum D statistic, a computer
program was written to accomplish the following for
specific selections of the covariance matrix $\Sigma$:

1. Generate a sample of desired size of bivariate 
normal random vectors from the given distribution, 

2. Generate the desired number of multiplying vectors, 
each of which would produce one linear combination of the 
sample vectors, 

3. Compute the linear combinations of the sample 
vectors by vector multiplication,

4. Perform a Kolmogorov-Smirnov test on each univariate
sample obtained as a result of a linear combination and
record the resulting value of D. The values of

$$D = \sup_x |S_N(x) - F(x)|,$$

where $S_N$ and $F$ are the functions previously defined, were
determined at values of $.01k$ ($k = 1, \dots, 100$) for the proposed
theoretical distribution F. That is, the sample cumulative
distribution, $S_N(x_k)$, was evaluated at each point $x_k$ where
$F(x_k)$ was a multiple of .01, D being taken to be the maximum
of the 100 differences

$$|S_N(x_k) - F(x_k)| .$$

5. Record the maximum value, over the linear combinations
performed on each sample, of the D statistics produced by
each particular sample.
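The five steps above can be sketched in Python as follows. This is an illustration under stated assumptions, not the original program: it uses Python's `random` module in place of Subroutines RANDU and GAUSS, generates the sample by the conditional method of Section IV, assumes a symmetric $\Sigma$, and approximates D on the grid $F(x_k) = .01k$ as step 4 describes (the point $k = 100$ is skipped because there $x_k = +\infty$ and both functions equal 1). Function names are hypothetical.

```python
import math
import random
from statistics import NormalDist


def grid_d(values, sd, points=100):
    """Step 4: approximate D by comparing S_N and F only at the points x_k
    with F(x_k) = k/points, k = 1, ..., points - 1."""
    f = NormalDist(0.0, sd)
    n = len(values)
    d = 0.0
    for k in range(1, points):
        p = k / points
        x_k = f.inv_cdf(p)                              # F(x_k) = p
        s_n = sum(1 for v in values if v <= x_k) / n    # S_N(x_k)
        d = max(d, abs(s_n - p))
    return d


def simulate_one_m(n, m, sigma, rng):
    """Steps 1-5 for one sample: the maximum grid D over m combinations."""
    (s11, s12), (_, s22) = sigma                        # Sigma assumed symmetric
    cond_sd = math.sqrt(s11 - s12 * s12 / s22)
    # Step 1: mean-zero bivariate normal sample (conditional method)
    sample = []
    for _ in range(n):
        x2 = rng.gauss(0.0, math.sqrt(s22))
        sample.append((rng.gauss(s12 / s22 * x2, cond_sd), x2))
    best = 0.0
    for _ in range(m):
        c1, c2 = rng.random(), rng.random()              # Step 2
        y = [c1 * x1 + c2 * x2 for x1, x2 in sample]     # Step 3
        sd = math.sqrt(c1 * c1 * s11 + 2 * c1 * c2 * s12 + c2 * c2 * s22)
        best = max(best, grid_d(y, sd))                  # Step 4
    return best                                          # Step 5


rng = random.Random(2024)
sigma = ((1.0, 0.5), (0.5, 2.0))
print(f"M for one sample of 100 vectors, 100 combinations: "
      f"{simulate_one_m(100, 100, sigma, rng):.3f}")
```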


The initial simulation procedure generated sets of 100
random vectors from a bivariate normal distribution with
covariance matrix $\Sigma$. One hundred 'random' linear
combinations of each set of vectors were computed and the D
statistic derived from each linear combination was recorded.
The results of this simulation indicated that the D
statistics for a given sample were grouped within an interval
approximately .05 units in length, but the location of the
interval was dependent upon the particular sample. This
phenomenon suggested that the value of D is dependent on
the sample to a higher degree than it is on the linear
function used. The results of five such simulations for
samples from each of two different bivariate normal
distributions are summarized in Table I. Note that with sample
number 1b, several D values exceeded the univariate
Kolmogorov-Smirnov critical value at the .05 level of
significance. Thus, using the univariate Kolmogorov-Smirnov
critical value, the hypothesis that the sample was from the
underlying distribution from which it was generated would
have been rejected for some linear combinations and
accepted for others. Using a critical value (determined
by level of significance and sample size) for the maximum D
would have eliminated this ambiguity.

It was also found that the relative frequency, within
intervals of length .01, of the D statistics remained
nearly constant as the number of linear combinations was varied.


The data produced by the simulation procedure described
above led to the consideration of using the maximum D
statistic for testing a bivariate sample against the
proposed bivariate normal distribution. These simulation data
indicate that the maximum D statistic derived from random
linear combinations of samples from a bivariate normal
population had the desirable characteristics that:

1) A unique maximum is obtained for each sample,
independent of the random multipliers, provided a sufficient
number of linear combinations are computed, and

2) The maximum value is obtained from various linear
combinations, at least one of which could be randomly
selected, with high probability, in as few as 25 trials
(selections) of multiplying vectors. In all cases
investigated, including those listed in Table I, the same value of
M was achieved over 25 linear combinations as was achieved
over 100 linear combinations for each particular sample.

As noted in Section II, the exact distribution of M was
found to be intractable. Therefore, in order to study
some of the characteristics of the distribution of M,
another computer simulation procedure was used to produce
a large sample of M. The simulation procedure may be
described as follows:

1) A sample of 25 vectors was generated from a
predetermined bivariate normal distribution. Since D, and
therefore M, are dependent on sample size, it was recognized
that data obtained by this simulation would pertain
to samples of size 25 only. However, one might expect the
characteristics of the distribution of M to be similar for
all sample sizes.

2) Twenty-five randomly selected multiplying vectors
were generated so that 25 linear combinations of each
sample were produced. The maximum D over the resulting 25
univariate samples was recorded. (From the initial
simulation, it was expected that 25 linear combinations would
produce the maximum D for any sample.) A total of 500 M
statistics, all derived from the same underlying bivariate
normal distribution, were produced.
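A sketch of this second simulation (500 values of M, samples of 25 vectors, 25 linear combinations each) and of the tabulation used in Tables II and III (relative frequencies in intervals of width .01, with sample mean and variance). Two assumptions differ from the original: Python's generator replaces RANDU and GAUSS, and D is computed exactly at the order statistics rather than on the 100-point grid, so the simulated M values may run slightly higher than the tabulated ones. The covariance matrix in the usage line is an arbitrary example, and the function names are hypothetical.

```python
import math
import random
import statistics
from collections import Counter
from statistics import NormalDist


def ks_d(values, cdf):
    z = sorted(values)
    n = len(z)
    return max(max(j / n - cdf(v), cdf(v) - (j - 1) / n)
               for j, v in enumerate(z, start=1))


def one_m(n, m, sigma, rng):
    """One value of M: n vectors from N(0, Sigma), m random combinations."""
    (s11, s12), (_, s22) = sigma
    cond_sd = math.sqrt(s11 - s12 * s12 / s22)
    sample = []
    for _ in range(n):
        x2 = rng.gauss(0.0, math.sqrt(s22))
        sample.append((rng.gauss(s12 / s22 * x2, cond_sd), x2))
    best = 0.0
    for _ in range(m):
        c1, c2 = rng.random(), rng.random()
        sd = math.sqrt(c1 * c1 * s11 + 2 * c1 * c2 * s12 + c2 * c2 * s22)
        y = [c1 * x1 + c2 * x2 for x1, x2 in sample]
        best = max(best, ks_d(y, NormalDist(0.0, sd).cdf))
    return best


def distribution_of_m(reps, n, m, sigma, seed=0):
    """Simulate reps values of M and count them in intervals of width .01."""
    rng = random.Random(seed)
    ms = [one_m(n, m, sigma, rng) for _ in range(reps)]
    counts = Counter(math.floor(v * 100) for v in ms)  # key k <-> [.01k, .01(k+1))
    return ms, counts


ms, counts = distribution_of_m(500, 25, 25, ((1.0, 0.5), (0.5, 1.0)))
for k in sorted(counts):
    print(f"{k / 100:.2f}-{(k + 1) / 100:.2f}  {counts[k]}")
print(f"Mean* {statistics.mean(ms):.4f}   Variance* {statistics.variance(ms):.4f}")
```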

The simulation procedures were repeated for different
parameter values of the underlying bivariate normal
distribution to produce five sets of 500 M statistics. Thus, each
set of 500 values was derived from linear combinations of
samples drawn from a different bivariate normal distribution.

The results of the simulation described above are
summarized in Table II. Unfortunately, it appears that there
is not a simple statistical relationship between the
distributions of the M statistics obtained with the samples drawn
from different bivariate distributions. And, of course,
if each different underlying distribution (of the sample)
produces a different distribution of the M statistic, it
would be impossible to tabulate values of all distributions
of M.


It did not seem unlikely that a similarity or other
relationship existed between the distributions of M
statistics obtained from samples which were derived from
bivariate normal distributions with identical correlation
coefficients. Therefore, a final simulation procedure was
carried out, using samples drawn from several non-identical
bivariate normal distributions with a common correlation
coefficient. The results are summarized in Table III.

Although there is a notable similarity between the
distributions of M statistics derived from bivariate normal
distributions with identical correlation coefficients ($\rho$),
the hypothesis that the distributions are identical was
rejected by a Kolmogorov-Smirnov test at the .05 level
of significance. This is also readily apparent for the
case in which $\rho = .3162$. Note that the difference in the
means of the samples of M is .0059, whereas one standard
deviation of the mean (computed from the sample standard
deviation) is approximately .0022. Thus the means are
nearly three standard deviations apart, which suggests that
the distributions are not the same.
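The comparison above involves two calculations that can be sketched directly: the two-sample Kolmogorov-Smirnov statistic used to compare two lists of M values, and the ratio of the difference in means to the standard error quoted in the text. The sketch assumes the standard large-sample critical value $c(\alpha)\sqrt{(n+m)/(nm)}$ with $c(\alpha) = \sqrt{-\tfrac{1}{2}\ln(\alpha/2)}$ (about 1.358 for $\alpha = .05$); the paper does not state which critical values were used. Function names are hypothetical.

```python
import math


def two_sample_ks(a, b):
    """Two-sample Kolmogorov-Smirnov statistic sup_x |S_a(x) - S_b(x)|."""
    a, b = sorted(a), sorted(b)
    na, nb = len(a), len(b)
    i = j = 0
    d = 0.0
    while i < na and j < nb:
        x = min(a[i], b[j])
        while i < na and a[i] == x:      # step past all values equal to x
            i += 1
        while j < nb and b[j] == x:
            j += 1
        d = max(d, abs(i / na - j / nb))
    return d


def ks_two_sample_critical(na, nb, alpha=0.05):
    """Large-sample critical value c(alpha) * sqrt((na + nb) / (na * nb))."""
    c_alpha = math.sqrt(-0.5 * math.log(alpha / 2))
    return c_alpha * math.sqrt((na + nb) / (na * nb))


# Two lists of 500 M values would be rejected as identical at the .05 level
# if two_sample_ks(list_1, list_2) exceeded this value:
print(f"critical value, 500 vs 500, alpha = .05: {ks_two_sample_critical(500, 500):.4f}")
# The means comparison quoted in the text:
print(f"difference / standard error = {0.0059 / 0.0022:.2f}")
```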

An interpretation of these results and how they may be
applied to a possible goodness of fit test for the bivariate
normal distribution is discussed in Section V, Summary and
Conclusions.


IV. RANDOM VARIABLE GENERATION TECHNIQUES 


In order to study the distribution of the M statistics
described in Sections II and III, it was necessary to
produce a large number of random variables from various
bivariate normal distributions. There are several possible
methods which might be used to generate the bivariate
normal random variables on a computer. One method would
be to generate independent normal random variables and
perform an appropriate transformation on them which will
produce a bivariate normal random vector. For example, to
generate random vectors from a bivariate normal
distribution with mean zero and covariance matrix $\Sigma$, where $\Sigma$ is
symmetric and positive definite, one could use the following
procedure.

1) Generate two independent random variables from a
normal (0,1) distribution, so that $X = (X_1, X_2)'$ is bivariate
normal $(0, I)$, where I is the identity matrix.

2) Perform the transformation $Z = CX$, where C satisfies
$CC' = \Sigma$. Then $Z = (Z_1, Z_2)'$ is bivariate normal $(0, \Sigma)$.
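A minimal Python sketch of this two-step transformation method, assuming a lower-triangular (Cholesky) choice of C, which is one of many matrices satisfying $CC' = \Sigma$. Python's `random.Random.gauss` stands in for the normal generator, and the function names and parameter values are illustrative.

```python
import math
import random


def cholesky_2x2(sigma):
    """Lower-triangular C with C C' = Sigma (Sigma symmetric positive definite)."""
    (s11, s12), (_, s22) = sigma
    c11 = math.sqrt(s11)
    c21 = s12 / c11
    c22 = math.sqrt(s22 - c21 * c21)
    return (c11, 0.0), (c21, c22)


def bivariate_normal_cholesky(n, sigma, rng):
    """Step 1: X ~ N(0, I); step 2: Z = C X ~ N(0, Sigma)."""
    (c11, _), (c21, c22) = cholesky_2x2(sigma)
    out = []
    for _ in range(n):
        x1, x2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        out.append((c11 * x1, c21 * x1 + c22 * x2))
    return out


z = bivariate_normal_cholesky(20_000, ((2.0, 0.6), (0.6, 1.0)), random.Random(5))
cov = sum(a * b for a, b in z) / len(z)     # the mean is known to be zero
print(f"sample covariance: {cov:.3f} (target 0.6)")
```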

In this study the bivariate normal random vectors were
generated using a conditional distribution approach. It
is a well-known characteristic of the bivariate normal
distribution that if $X = (X_1, X_2)'$ is distributed bivariate
normal $(\mu, \Sigma)$, where $\mu = (\mu_1, \mu_2)'$ and

$$\Sigma = \begin{pmatrix} \sigma_{11} & \sigma_{12} \\ \sigma_{21} & \sigma_{22} \end{pmatrix},$$

then $X_2$ is distributed univariate normal $(\mu_2, \sigma_{22})$. It
is also well known that the conditional distribution of
$X_1$ given $X_2 = x_2$ is univariate normal

$$\left(\mu_1 + \sigma_{12}\sigma_{22}^{-1}(x_2 - \mu_2),\; \sigma_{11} - \sigma_{12}\sigma_{22}^{-1}\sigma_{21}\right).$$

Therefore, after generating $X_2 = x_2$ from a univariate normal
$(\mu_2, \sigma_{22})$ distribution, the conditional distribution of
$X_1$, given $X_2 = x_2$, was computed and $X_1$ was then generated
from that univariate normal distribution.
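The conditional method can be sketched in a few lines of Python; the code mirrors the marginal of $X_2$ and the conditional mean and variance of $X_1$ given $X_2 = x_2$, and a moment check compares the generated vectors with the target parameters. Python's `random.Random.gauss` stands in for Subroutine GAUSS; the parameter values and function names are arbitrary illustrations.

```python
import math
import random


def bivariate_normal_conditional(n, mu, sigma, rng):
    """Generate n vectors (X1, X2) ~ N(mu, Sigma) by the conditional method:
    X2 from N(mu2, s22), then X1 from
    N(mu1 + s12/s22 * (x2 - mu2), s11 - s12 * s21 / s22)."""
    mu1, mu2 = mu
    (s11, s12), (s21, s22) = sigma
    cond_sd = math.sqrt(s11 - s12 * s21 / s22)
    out = []
    for _ in range(n):
        x2 = rng.gauss(mu2, math.sqrt(s22))          # marginal of X2
        cond_mean = mu1 + s12 / s22 * (x2 - mu2)     # E(X1 | X2 = x2)
        x1 = rng.gauss(cond_mean, cond_sd)           # conditional of X1
        out.append((x1, x2))
    return out


def sample_moments(pairs):
    """Sample means, variances and covariance (divisor n) of (X1, X2) pairs."""
    n = len(pairs)
    m1 = sum(p[0] for p in pairs) / n
    m2 = sum(p[1] for p in pairs) / n
    v1 = sum((p[0] - m1) ** 2 for p in pairs) / n
    v2 = sum((p[1] - m2) ** 2 for p in pairs) / n
    c12 = sum((p[0] - m1) * (p[1] - m2) for p in pairs) / n
    return m1, m2, v1, v2, c12


pairs = bivariate_normal_conditional(
    20_000, (1.0, -2.0), ((1.0, 0.8), (0.8, 4.0)), random.Random(9))
print("means, variances, covariance:", [round(t, 3) for t in sample_moments(pairs)])
```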
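To verify that this produces a random vector with the
characteristics of the given bivariate normal distribution,
consider the following.

Let $\mu = (\mu_1, \mu_2)' = (0, 0)'$. Generate $X_2 = x_2$ from its
marginal distribution $N(0, \sigma_{22})$. Then

$$E(X_2) = 0 \quad\text{and}\quad E[(X_2 - \mu_2)^2] = E(X_2^2) = \sigma_{22}.$$

Generate V, independent of $X_2$, from the univariate normal
(0,1) distribution. Now

$$E(V) = 0 \quad\text{and}\quad E(V^2) = 1.$$

Next let

$$X_1 = (\sigma_{11} - \sigma_{12}\sigma_{22}^{-1}\sigma_{21})^{1/2}\, V + \sigma_{12}\sigma_{22}^{-1} X_2 .$$

Given $X_2$, $X_1$ is univariate normal
$(\sigma_{12}\sigma_{22}^{-1}X_2,\; \sigma_{11} - \sigma_{12}\sigma_{22}^{-1}\sigma_{21})$, and

$$E(X_1) = (\sigma_{11} - \sigma_{12}\sigma_{22}^{-1}\sigma_{21})^{1/2} E(V) + \sigma_{12}\sigma_{22}^{-1} E(X_2) = 0 = \mu_1 .$$

Similarly, the covariance between $X_1$ and $X_2$ is

$$\begin{aligned}
\operatorname{Cov}(X_1, X_2) &= E[(X_1 - \mu_1)(X_2 - \mu_2)] = E(X_1 X_2) \\
&= E\{[(\sigma_{11} - \sigma_{12}\sigma_{22}^{-1}\sigma_{21})^{1/2} V + \sigma_{12}\sigma_{22}^{-1} X_2]\, X_2\} \\
&= (\sigma_{11} - \sigma_{12}\sigma_{22}^{-1}\sigma_{21})^{1/2} E(V X_2) + \sigma_{12}\sigma_{22}^{-1} E(X_2^2).
\end{aligned}$$

Since V and $X_2$ are independent,

$$E(V X_2) = E(V)\,E(X_2) = 0.$$

Continuing from above,

$$\operatorname{Cov}(X_1, X_2) = 0 + \sigma_{12}\sigma_{22}^{-1}\sigma_{22} = \sigma_{12} = \sigma_{21}.$$

The variance of $X_1$ is

$$\begin{aligned}
E[(X_1 - \mu_1)^2] &= E(X_1^2) \\
&= E\{[(\sigma_{11} - \sigma_{12}\sigma_{22}^{-1}\sigma_{21})^{1/2} V + \sigma_{12}\sigma_{22}^{-1} X_2]^2\} \\
&= (\sigma_{11} - \sigma_{12}\sigma_{22}^{-1}\sigma_{21}) E(V^2) + 2(\sigma_{11} - \sigma_{12}\sigma_{22}^{-1}\sigma_{21})^{1/2}\sigma_{12}\sigma_{22}^{-1} E(V X_2) + (\sigma_{12}\sigma_{22}^{-1})^2 E(X_2^2) \\
&= \sigma_{11} - \sigma_{12}\sigma_{22}^{-1}\sigma_{21} + 0 + \sigma_{12}^2\sigma_{22}^{-1} \\
&= \sigma_{11},
\end{aligned}$$

since $\sigma_{12} = \sigma_{21}$.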

Incidentally, this technique can be extended to a method
of generation of p-component multivariate normal random
variables, since the distribution of $X_i$ given any of the
other components $X_j = x_j$ ($j = 1, 2, \dots, p;\ j \ne i$) is also
a normal distribution whose parameters may be computed.

Standard computer routines were used to generate the
univariate normal and uniform random variables required for
the simulation procedure previously described. The routines
are shown in the computer program under Subroutine RANDU
(for uniform random variables) and Subroutine GAUSS (for
univariate normal random variables).
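The listing of Subroutines RANDU and GAUSS is not reproduced in this section. Assuming they are the routines of the same names from the IBM System/360 Scientific Subroutine Package (the names suggest this, but the text does not say so), they behave roughly like the Python sketch below: RANDU is the multiplicative congruential generator $I_{k+1} = 65539\, I_k \bmod 2^{31}$, and GAUSS sums 12 uniforms, which have mean 6 and variance 1, to approximate a normal variate. RANDU is now known to have poor multidimensional properties (consecutive triples lie on 15 planes), so a modern reimplementation would use a better generator.

```python
def randu(ix):
    """One RANDU step: returns (new integer state, uniform value in (0, 1)).
    Sketch of IY = 65539 * IX (mod 2**31), YFL = IY / 2**31; IX should be odd."""
    iy = (65539 * ix) % 2**31
    return iy, iy / 2**31


def gauss(ix, s, am):
    """Approximate normal (AM, S**2) variate from 12 RANDU uniforms:
    their sum has mean 6 and variance 12 * (1/12) = 1."""
    a = 0.0
    for _ in range(12):
        ix, y = randu(ix)
        a += y
    return ix, (a - 6.0) * s + am


state = 12345                       # an odd starting value
state, u = randu(state)
state, z = gauss(state, 1.0, 0.0)   # approximately standard normal
print(f"uniform: {u:.6f}  normal: {z:.6f}")
```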
V. SUMMARY AND CONCLUSIONS


The empirical data indicate that the distribution of
the maximum of the D statistics (M), derived from
Kolmogorov-Smirnov tests of linear combinations of samples from
bivariate normal distributions, was dependent upon the
covariance matrix of the underlying distribution of the
sample. Therefore it would be impossible to tabulate the
distribution of M except for specific parameters of the
underlying distribution.

However, a goodness of fit test for the bivariate normal
distribution can be constructed using the M statistic.
The test might consist of using a simulation procedure,
similar to that used in this paper, to produce a sample
distribution of M. This distribution of M would be derived
from samples which are from a bivariate distribution
identical to the proposed hypothesized bivariate
distribution. Then a critical value of M, for a test with level
of significance $\alpha$ (for example, $\alpha = .05$), may be established
as the value at which the $(1 - \alpha)$ percentile point of the
distribution of M occurs. Obviously the number of linear combinations
and the number of M statistics for development of the
sample distribution of M must be determined by the
experimenter performing the test. (Note that the size of the
samples generated in the simulation procedure must be
identical to the size of the sample to be tested.)
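The simulation-based test proposed above can be sketched as follows, assuming mean zero, uniform (0,1) multipliers, exact D at the order statistics, and the empirical $(1-\alpha)$ point taken as the smallest simulated M with at least a fraction $1-\alpha$ of the simulated values at or below it. The function names and default settings (m = 25 combinations, 500 replications, mirroring Section III) are illustrative choices, not prescriptions from the paper.

```python
import math
import random
from statistics import NormalDist


def ks_d(values, cdf):
    z = sorted(values)
    n = len(z)
    return max(max(j / n - cdf(v), cdf(v) - (j - 1) / n)
               for j, v in enumerate(z, start=1))


def m_statistic(sample, sigma, m, rng):
    (s11, s12), (_, s22) = sigma
    best = 0.0
    for _ in range(m):
        c1, c2 = rng.random(), rng.random()
        sd = math.sqrt(c1 * c1 * s11 + 2 * c1 * c2 * s12 + c2 * c2 * s22)
        y = [c1 * x1 + c2 * x2 for x1, x2 in sample]
        best = max(best, ks_d(y, NormalDist(0.0, sd).cdf))
    return best


def simulated_critical_value(n, sigma, m=25, reps=500, alpha=0.05, seed=0):
    """(1 - alpha) point of M simulated under the hypothesized N(0, Sigma).
    n must equal the size of the sample that will be tested."""
    rng = random.Random(seed)
    (s11, s12), (_, s22) = sigma
    cond_sd = math.sqrt(s11 - s12 * s12 / s22)
    ms = []
    for _rep in range(reps):
        sample = []
        for _ in range(n):
            x2 = rng.gauss(0.0, math.sqrt(s22))
            sample.append((rng.gauss(s12 / s22 * x2, cond_sd), x2))
        ms.append(m_statistic(sample, sigma, m, rng))
    ms.sort()
    # smallest simulated M with at least (1 - alpha) of the values at or below it
    k = math.ceil(round((1 - alpha) * reps, 9)) - 1
    return ms[k]


crit = simulated_critical_value(25, ((1.0, 0.5), (0.5, 1.0)))
print(f"simulated .95 point of M (N = 25, m = 25): {crit:.4f}")
```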




For example, suppose one wishes to test the hypothesis
that a sample of N vectors was drawn from a population whose
distribution is bivariate normal $(0, \Sigma)$, where $\Sigma$ is one of the
covariance matrices for which the distribution of M is
tabulated in Table II. If N = 25, then the distribution of the
M statistic is shown in Table II, listed under the appropriate
covariance matrix. For $\alpha = .05$, the critical value of M is
the value at which the .95 percentile point of the
distribution occurs. Then 25 random linear combinations of the
sample vectors would be computed and the 25 resulting
univariate samples tested against the computed univariate
distributions by a Kolmogorov-Smirnov test. If the maximum
of the 25 D statistics thus obtained is greater than this
critical value, the hypothesis is rejected. Otherwise the
hypothesis is accepted.
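Applying the test to a sample then amounts to computing M and comparing it with the critical value. In the sketch below the critical value is a parameter; the value 0.30 in the usage line is a hypothetical placeholder, not a figure read from Table II, and the sample is deliberately generated with mean (3, 3) so that the hypothesis of zero mean should be rejected. Function names are illustrative.

```python
import math
import random
from statistics import NormalDist


def max_d_over_combinations(sample, sigma, m, rng):
    """M: the largest Kolmogorov D over m random linear combinations."""
    (s11, s12), (_, s22) = sigma
    best = 0.0
    for _ in range(m):
        c1, c2 = rng.random(), rng.random()
        sd = math.sqrt(c1 * c1 * s11 + 2 * c1 * c2 * s12 + c2 * c2 * s22)
        f = NormalDist(0.0, sd).cdf                 # hypothesized F for C_i X
        z = sorted(c1 * x1 + c2 * x2 for x1, x2 in sample)
        n = len(z)
        d = max(max(j / n - f(v), f(v) - (j - 1) / n)
                for j, v in enumerate(z, start=1))
        best = max(best, d)
    return best


def bivariate_normal_gof_test(sample, sigma, critical_m, m=25, seed=0):
    """Return (M, reject) for H: the sample is from N(0, Sigma).
    critical_m should be the simulated (1 - alpha) point of M for this
    Sigma and this sample size."""
    m_value = max_d_over_combinations(sample, sigma, m, random.Random(seed))
    return m_value, m_value > critical_m


# Illustration: 25 vectors whose true mean is (3, 3), tested against N(0, I).
rng = random.Random(42)
shifted = [(rng.gauss(3.0, 1.0), rng.gauss(3.0, 1.0)) for _ in range(25)]
HYPOTHETICAL_CRITICAL_M = 0.30   # placeholder, not a value from Table II
print(bivariate_normal_gof_test(shifted, ((1.0, 0.0), (0.0, 1.0)), HYPOTHETICAL_CRITICAL_M))
```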

There are obviously many interesting aspects concerning
this (and other) multivariate goodness of fit tests which
should be investigated. For example, the power of the
test described in this paper, when applied to samples from
distributions other than the bivariate normal, might be
investigated. Also, a goodness of fit test based on a
statistic other than M (e.g., the mean or variance of the
D statistics obtained from linear combinations of the sample
components) might prove to be interesting. It is, of course,
desirable to find a "reasonable" statistic for which the
distribution may be found and tabulated.
TABLE I
Distribution (Relative Frequency) of Kolmogorov-Smirnov
Statistics (D) that Result from 100 Linear Combinations
of 100 Bivariate Random Vectors from Two Bivariate Normal
Distributions
PE Fs eo: . wm 2 
= [33 = [P| 





SAMPLE NUMBER

EwaListic 2a Ob ee ae oe 
POH 2 
05 \ 2 ee 108 36 
06 Ll 50 66 35 
oy 26 Pp 2h 9 2am «(19 8 
08 6 22 5 17 7 28 
09 44 20s 22m 19 36 6 
mo om 17 8634 32 9 18 31 
Tied \ 29 10 29 
Pe 9 20) 
goles 3, 14 
14 11 
15 


NOTE: 1) The critical value of the univariate
Kolmogorov-Smirnov statistic for this sample size and
$\alpha = .05$ is .1356.
2) $\Sigma$ = covariance matrix of the distribution from
which the sample was drawn.


TABLE II

Distribution (Relative Frequency) of 500 Maximum Kolmogorov-
Smirnov (M) Statistics, Each of which was Derived from 25
Linear Combinations of Samples from Various Bivariate Normal
Distributions

Covariance Matrix $\Sigma$ of the Distribution of the
Population from which Samples were Drawn





Range of 
Max. K- 7 
i: 5 3 Le 1 ap) Mt -1.8 5. i 
Statistic (M) 3 4 3 4 i 1] nw ‘ ki | 
Otome ..07 il 
»07—2018 3 
.08.09 7 
“00 = m0 oe. 
C= oy i 
slide 5 HO \ 
{125,03 8 KY 5 6 
~13-.14 DL 26 a 5 9 
ye 15 eal 55 18 12 2 
ee — FS 28 34 19 19 32 
.16-.17 38 Si) al 23 43 
.17-.18 34 32 an 30 59 
pale 3. ie Wy oe) Nak o> 35 
13S 20 lo cul Ho Ho 38 
2021 39 20 50 4O 31 
.21-.22 30 25 iat 36 26 
ee) ee phy 16 28 32 31 
ey 28 14 34 hg eal 
ge SS as 25 Te 31 38 33 
.25-.26 29 8 30 23 eri 
NS 20 6 20 21 20 
Re mens 8 5 15 om 14 
SD Oe 29 18 6 15 Peli 15 
s29=30 areal 4 10 13 15 
.30-.31 fi 3 i, 8 8 
paomee 8 2 fi 9 7 
2 32-. 33 0 2 { 8 ii 
eae. 2 hi 6 fi 7 10 Daal 
wel _ es p 1 2 3 
»35-.36 5} 5 3 5 
ee ey 2 2 2 
37—. 38 i 
Mean* . 2033 1624 Pesical era) , 2oieal 
Variance* .0028 .0027 .0024 .0027 .0028
NOTE: 1) *Sample mean and sample variance of M statistics.
2) Each sample size was 25 bivariate vectors.
TABLE III

Distribution (Relative Frequency) of 500 Maximum Kolmogorov-
Smirnov Statistics (M) for Samples Drawn from Various
Bivariate Normal Distributions with Identical Correlation
Coefficient, $\rho$





Corr. 
2 : 1 6 
ane : % 
vow. wa T 

Bie 3 
ie 2 8 2 5 
me i 37 18 3 ve 
iG | 31 D7 19 12 
(ie @0 28 oil Ba 2a 
iy 38 Hy 26 33 
ie gi 38 25 16 
. 9 4] 51 39 Dio 
, 20 40 2111 215 29 
aul 39 26 38 a7 
22 30 28 42 3y 
| 2B a a7 45 Zl 
: om 28 il 37 Qe 
as 25 26 23 2u 
26 29 16 30 22 
, i 28 25 22 26 
[26 8 9 10 19 
, 29 18 16 iL 12 
20 | ae) 6 14 16 
el " 9 ial 7 
eet 8 D 12 9 
. 38 0 3 a 3 
ay 6 1 1 2 
. 2 2, Lf 6 
. 36 3 5 4 6 
Pm il. 3 2 5 
38 

Mean* eo@ee |. reese | .2168" .21308 .2061 2120 

ariance”* ao028 @ ,0ge@8 | .0e26) .002m .0028 . .0026 


NOTE: 1) *Mean and variance of M.
2) Sample size = 25 vectors.
3) $\Sigma$ = covariance matrix of the distribution of the
population from which samples were drawn.